Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove synchronization in Reference and unused methods in ReferenceQueue #20694

Merged
merged 2 commits into from
Dec 3, 2024

Conversation

pshipton
Copy link
Member

Issue #13823

@pshipton
Copy link
Member Author

jenkins test sanity plinux jdk8

@pshipton
Copy link
Member Author

@babsingh
Copy link
Contributor

babsingh commented Nov 28, 2024

The deadlock happens because the two locks are acquired in different order in the two code paths.

20:55:13  "Finalizer thread" Id=29 BLOCKED on java.lang.ref.ReferenceQueue@49d657dd owned by "Thread-11" Id=33
20:55:13  	at java.lang.ref.ReferenceQueue.enqueue(ReferenceQueue.java:179) // Acquire ReferenceQueue's monitor
20:55:13  	at java.lang.ref.Reference.enqueueImpl(Reference.java:147) // Acquire Reference's monitor
20:55:13  


20:55:13  "Thread-11" Id=33 BLOCKED on java.util.logging.LogManager$LoggerWeakRef@596d6465 owned by "Finalizer thread" Id=29
20:55:13  	at java.lang.ref.Reference.dequeue(Reference.java:195) // Acquire Reference's monitor
20:55:13  	at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:85) // Acquire ReferenceQueue's monitor
20:55:13  	at java.util.logging.LogManager.drainLoggerRefQueueBounded(LogManager.java:1130)
20:55:13  	at java.util.logging.LogManager.addLogger(LogManager.java:1160)
20:55:13  	at java.util.logging.LogManager.demandLogger(LogManager.java:556)
20:55:13  	at java.util.logging.Logger.demandLogger(Logger.java:455)
20:55:13  	at java.util.logging.Logger.getLogger(Logger.java:502)
20:55:13  	at TestLogConfigurationDeadLock$AddLogger.run(TestLogConfigurationDeadLock.java:175)

Removing the synchronized block in dequeue might change Reference.state while enqueue is being executed. This can lead to incorrect behavior.

@babsingh
Copy link
Contributor

babsingh commented Nov 28, 2024

The correct fix is to avoid the different order in which the two locks are acquired. Example: In ReferenceQueue, we can avoid invoking Reference.dequeue() from within the synchronized(this) block.

public Reference<? extends T> poll () {
        Reference ref;

        /* Optimization to return immediately and not synchronize if there is nothing in the queue */
        if(empty) {
                return null;
        }
        synchronized(this) {
                if(empty) {
                        return null;
                }
                ref = references[head];
                references[head++] = null;
                if(head == references.length) {
                        head = 0;
                }
                if(head == tail) {
                        empty = true;
                }
        }
        ref.dequeue(); // Moved outside the synchronized block
        return ref;
}
public Reference<? extends T> remove(long timeout) throws IllegalArgumentException, InterruptedException {
        if (timeout < 0) throw new IllegalArgumentException();

        Reference ref;
        synchronized(this) {
                if(empty) {
                        wait(timeout);
                        if(empty) return null;
                }
                ref = references[head];
                references[head++] = null;
                if(head == references.length) {
                        head = 0;
                }
        }
        ref.dequeue(); // Moved outside the synchronized block
        synchronized(this) { // Not sure if it matters to notifyAll() after invoking dequeue()
                if(head == tail) {
                        empty = true;
                } else {
                        notifyAll();
                }
        }
        return ref;
}

@dmitripivkine Does it matter if we move the dequeue operation at the end? Also, is there a requirement to notifyAll after dequeue or can we notifyAll first and then invoke dequeue?

@keithc-ca
Copy link
Contributor

correct fix is to avoid the different order in which the two locks are acquired

Because this is a public class, we don't have complete control over the synchronization order.

Removing synchronization in isEnqueued() is safe because there are no values of state that might be observed that weren't possible before.

All of the uses of dequeue() are in ReferenceQueue - none of them require synchronization within dequeue().

@babsingh
Copy link
Contributor

All of the uses of dequeue() are in ReferenceQueue - none of them require synchronization within dequeue().

Once the synchronized block in Reference.dequeue is removed, there will be data contention in updating Reference.state when ReferenceQueue.poll or ReferenceQueue.remove is invoked concurrently with Reference.enqueue.

@pshipton
Copy link
Member Author

I don't think the change to remove() is correct in #20694 (comment)

I've pushed a new commit (I can squash before merge) to ensure lock ordering and consistency between enqueue and dequeue.

@babsingh
Copy link
Contributor

babsingh commented Nov 28, 2024

I don't think the change to remove() is correct in #20694 (comment)

head and empty fields need to be updated in a single synchronized block. Spreading the operations across two synchronized blocks will lead to inconsistency in updating those fields. My initial proposal, before updating my previous comment, was the following:

public Reference<? extends T> remove(long timeout) throws IllegalArgumentException, InterruptedException {
        if (timeout < 0) throw new IllegalArgumentException();

        Reference ref;
        boolean notifyAll = true;
        synchronized(this) {
                if(empty) {
                        wait(timeout);
                        if(empty) return null;
                }
                ref = references[head];
                references[head++] = null;
                if(head == references.length) {
                        head = 0;
                }
                if(head == tail) {
                        empty = true;
                        notifyAll = false;
                }
        }
        ref.dequeue(); // Moved outside the synchronized block
        if (notifyAll) {
                synchronized(this) {
                        notifyAll();
                }
        }
        return ref;
}

@pshipton
Copy link
Member Author

jenkins test sanity plinux jdk8

@pshipton
Copy link
Member Author

Previous PR build https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/6597/ passed.

@pshipton
Copy link
Member Author

jenkins compile plinux jdk8

@pshipton
Copy link
Member Author

jenkins compile plinux jdk11

@pshipton
Copy link
Member Author

pshipton commented Dec 2, 2024

Just some known problems in the test results, the non-network failures passed in reruns.

@pshipton pshipton force-pushed the refqueue branch 2 times, most recently from 7ffa7c4 to 2ecb6c9 Compare December 2, 2024 15:05
@pshipton
Copy link
Member Author

pshipton commented Dec 2, 2024

jenkins compile plinux jdk8,jdk21

@pshipton
Copy link
Member Author

pshipton commented Dec 2, 2024

Updated.

@pshipton
Copy link
Member Author

pshipton commented Dec 2, 2024

jenkins compile amac jdk21

@pshipton
Copy link
Member Author

pshipton commented Dec 2, 2024

Created ibmruntimes/openj9-openjdk-jdk#900 and a backport for jdk23. I accidentally pushed the same change to jdk21. We shouldn't remove the constructor until z/OS and valhalla are also updated.

@babsingh babsingh changed the title Remove some synchronization in Reference to avoid deadlock Remove synchronization in Reference and unused methods in ReferenceQueue Dec 3, 2024
@babsingh
Copy link
Contributor

babsingh commented Dec 3, 2024

@pshipton Is the testing from #20694 (comment) still good? or a new set of testing needs to be run for the latest changes?

@pshipton
Copy link
Member Author

pshipton commented Dec 3, 2024

I think the testing is still valid. I restored some missing return statements, but nothing failed as a result of them missing, I don't expect any failure from adding them back. Otherwise the changes are cosmetic.
I'll run an internal build though.
http://vmfarm.rtp.raleigh.ibm.com/build_info.php?build_id=82910
8: https://trssrtp1.fyre.ibm.com/resultSummary?parentId=674f37479ce27a3b4444c8cc
11: https://trssrtp1.fyre.ibm.com/resultSummary?parentId=674f36429ce27a3b4444bb55

@pshipton
Copy link
Member Author

pshipton commented Dec 3, 2024

jenkins compile plinux jdk8,jdk11

@keithc-ca keithc-ca merged commit 0ef098c into eclipse-openj9:master Dec 3, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants